{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#| hide\n",
    "!pip install -Uqq nixtla fugue[dask]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#| hide \n",
    "from nixtla.utils import in_colab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#| hide \n",
    "IN_COLAB = in_colab()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#| hide\n",
    "if not IN_COLAB:\n",
    "    from nixtla.utils import colab_badge\n",
    "    from dotenv import load_dotenv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dask\n",
    "\n",
    "> Run TimeGPT distributedly on top of Dask"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Dask](https://www.dask.org/get-started) is an open source parallel computing library for Python. In this guide, we will explain how to use `TimeGPT` on top of Dask. \n",
    "\n",
    "**Outline:** \n",
    "\n",
    "1. [Installation](#installation)\n",
    "\n",
    "2. [Load Your Data](#load-your-data)\n",
    "\n",
    "3. [Import Dask](#import-dask) \n",
    "\n",
    "4. [Use TimeGPT on Dask](#use-timegpt-on-dask)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/17_computing_at_scale_dask_distributed.ipynb)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#| echo: false\n",
    "if not IN_COLAB:\n",
    "    load_dotenv()\n",
    "    colab_badge('docs/tutorials/18_computing_at_scale_dask_distributed')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Installation "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install Dask through [Fugue](https://fugue-tutorials.readthedocs.io/). Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Dask. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "::: {.callout-note}\n",
    "You can install `fugue` with `pip`:\n",
    "    \n",
    "```shell\n",
    "pip install fugue[dask]\n",
    "```\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If executing on a distributed `Dask` cluster, ensure that the `nixtla` library is installed across all the workers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load Data "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can load your data as a `pandas` DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>unique_id</th>\n",
       "      <th>ds</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-10-22 00:00:00</td>\n",
       "      <td>70.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-10-22 01:00:00</td>\n",
       "      <td>37.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-10-22 02:00:00</td>\n",
       "      <td>37.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-10-22 03:00:00</td>\n",
       "      <td>44.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-10-22 04:00:00</td>\n",
       "      <td>37.10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  unique_id                   ds      y\n",
       "0        BE  2016-10-22 00:00:00  70.00\n",
       "1        BE  2016-10-22 01:00:00  37.10\n",
       "2        BE  2016-10-22 02:00:00  37.10\n",
       "3        BE  2016-10-22 03:00:00  44.75\n",
       "4        BE  2016-10-22 04:00:00  37.10"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\n",
    "    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',\n",
    "    parse_dates=['ds'],\n",
    ") \n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Import Dask"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import Dask and convert the `pandas` DataFrame to a Dask DataFrame. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import dask.dataframe as dd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><strong>Dask DataFrame Structure:</strong></div>\n",
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>unique_id</th>\n",
       "      <th>ds</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>npartitions=2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>string</td>\n",
       "      <td>string</td>\n",
       "      <td>float64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4200</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8399</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "<div>Dask Name: to_pyarrow_string, 2 graph layers</div>"
      ],
      "text/plain": [
       "Dask DataFrame Structure:\n",
       "              unique_id      ds        y\n",
       "npartitions=2                           \n",
       "0                string  string  float64\n",
       "4200                ...     ...      ...\n",
       "8399                ...     ...      ...\n",
       "Dask Name: to_pyarrow_string, 2 graph layers"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dask_df = dd.from_pandas(df, npartitions=2)\n",
    "dask_df "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Use TimeGPT on Dask "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using `TimeGPT` on top of `Dask` is almost identical to the non-distributed case. The only difference is that you need to use a `Dask` DataFrame, which we already defined in the previous step. \n",
    "\n",
    "First, instantiate the `NixtlaClient` class. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nixtla import NixtlaClient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nixtla_client = NixtlaClient(\n",
    "    # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
    "    api_key = 'my_api_key_provided_by_nixtla'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 👍 Use an Azure AI endpoint\n",
    ">\n",
    "> To use an Azure AI endpoint, set the `base_url` argument:\n",
    ">\n",
    "> `nixtla_client = NixtlaClient(base_url=\"you azure ai endpoint\", api_key=\"your api_key\")`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#| hide \n",
    "if not IN_COLAB:\n",
    "    nixtla_client = NixtlaClient()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then use any method from the `NixtlaClient` class such as [`forecast`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) or [`cross_validation`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-cross-validation)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>unique_id</th>\n",
       "      <th>ds</th>\n",
       "      <th>TimeGPT</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-31 00:00:00</td>\n",
       "      <td>45.190453</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-31 01:00:00</td>\n",
       "      <td>43.244446</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-31 02:00:00</td>\n",
       "      <td>41.958389</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-31 03:00:00</td>\n",
       "      <td>39.796486</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-31 04:00:00</td>\n",
       "      <td>39.204533</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  unique_id                   ds    TimeGPT\n",
       "0        BE  2016-12-31 00:00:00  45.190453\n",
       "1        BE  2016-12-31 01:00:00  43.244446\n",
       "2        BE  2016-12-31 02:00:00  41.958389\n",
       "3        BE  2016-12-31 03:00:00  39.796486\n",
       "4        BE  2016-12-31 04:00:00  39.204533"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fcst_df = nixtla_client.forecast(dask_df, h=12)\n",
    "fcst_df.compute().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📘 Available models in Azure AI\n",
    ">\n",
    "> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
    ">\n",
    "> `nixtla_client.forecast(..., model=\"azureai\")`\n",
    "> \n",
    "> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
    "> \n",
    "> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>unique_id</th>\n",
       "      <th>ds</th>\n",
       "      <th>cutoff</th>\n",
       "      <th>TimeGPT</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-30 04:00:00</td>\n",
       "      <td>2016-12-30 03:00:00</td>\n",
       "      <td>39.375439</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-30 05:00:00</td>\n",
       "      <td>2016-12-30 03:00:00</td>\n",
       "      <td>40.039215</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-30 06:00:00</td>\n",
       "      <td>2016-12-30 03:00:00</td>\n",
       "      <td>43.455849</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-30 07:00:00</td>\n",
       "      <td>2016-12-30 03:00:00</td>\n",
       "      <td>47.716408</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>BE</td>\n",
       "      <td>2016-12-30 08:00:00</td>\n",
       "      <td>2016-12-30 03:00:00</td>\n",
       "      <td>50.31665</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  unique_id                   ds               cutoff    TimeGPT\n",
       "0        BE  2016-12-30 04:00:00  2016-12-30 03:00:00  39.375439\n",
       "1        BE  2016-12-30 05:00:00  2016-12-30 03:00:00  40.039215\n",
       "2        BE  2016-12-30 06:00:00  2016-12-30 03:00:00  43.455849\n",
       "3        BE  2016-12-30 07:00:00  2016-12-30 03:00:00  47.716408\n",
       "4        BE  2016-12-30 08:00:00  2016-12-30 03:00:00   50.31665"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cv_df = nixtla_client.cross_validation(dask_df, h=12, n_windows=5, step_size=2)\n",
    "cv_df.compute().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also use exogenous variables with `TimeGPT` on top of `Dask`. To do this, please refer to the [Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables) tutorial. Just keep in mind that instead of using a pandas DataFrame, you need to use a `Dask` DataFrame instead."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
